Abstract — An increasing number of retail energy markets show 
price fluctuations, providing users with the opportunity to buy 
energy at lower than average prices. We propose to temporarily 
store this inexpensive energy in a battery, and use it to satisfy 
demand when energy prices are high, thus allowing users to 
exploit the price variations without having to shift their demand 
to the low-price periods. We study the battery control policy that 
yields the best performance, i.e., minimizes the total discounted 
costs. The optimal policy is shown to have a threshold structure, 
and we derive these thresholds in a few special cases. The cost 
savings obtained from energy storage are demonstrated through 
extensive numerical experiments, and we offer various directions 
for future research. 

Index Terms — Battery storage, dynamic pricing, dynamic pro- 
gramming, energy storage, Markov decision processes, threshold 
policy. 



I. Introduction 

Wholesale energy prices exhibit significant fluctuations dur- 
ing each day due to variations in demand and generator 
capacity. End users are traditionally not exposed to these 
fluctuations but pay a fixed retail energy price. Economists 
have long argued to remove the fixed retail prices in favor 
of prices that change during the day. Such dynamic pricing 
would better reflect the prices on the wholesale market and 
has been predicted to lead to lower demand peaks and to a 
lower and less volatile wholesale price [2]. Implementations 
of dynamic pricing have been enabled by recent developments 
in smart-grid technology such as smart meters. 

An example of an increasingly popular dynamic pricing 
scheme is time-of-use pricing. Such pricing typically provides 
two or three price levels (e.g., 'off-peak', 'mid-peak' and 
'on-peak') where the relevant level depends on the time of 
day. The price levels are determined well in advance and are 
only changed once or twice per year. A second example of 
dynamic pricing is real-time pricing, where the retail energy 
price changes hourly or half-hourly and is based on the price 
on the wholesale energy market. 

Dynamic pricing creates an opportunity for users such as 
households and data centers to exploit the price fluctuations 
and reduce energy costs. However, doing so would require 
users to shift their demands to low-price periods, and in 
practice users only show a minor shift in their demand in 
response to changes in the energy prices [3]-[6]. A possible 
solution is to equip users with a battery that can be used for 
energy storage; the battery can be charged when the energy 
price is low and the stored energy can then be used when the 
price is high. This allows users to benefit from the energy price 
variations without having to adjust their consumption. Energy 
can be stored either in a dedicated battery or in existing storage 
such as the battery pack of an electric car [7] (residential users) 
or a backup power supply [8] (data centers). 

In this paper, we address the problem of organizing energy 
storage purchases to minimize long-term energy costs under 
variable demands and prices. This problem involves deciding 
whether to satisfy demand directly from the grid or from the 
battery, as well as up to what level to charge or discharge the 
battery. The resulting optimization problem is complicated by 
the stochastic nature of price and demand and by the fact 
that we aim to minimize the long-term costs. We model the 
problem as a Markov Decision Process and show that there 
exists a two-threshold stationary cost-minimizing policy. When 
the battery level is below the lower threshold, the battery is 
charged up to it, and the battery is discharged when above the 
upper threshold. By comparing the costs incurred under this 
policy with the cost of satisfying all demand directly from the 
grid, we can show that energy storage may lead to significant 
cost savings. 

Residential energy storage has been studied for the case 
of arbitrage, i.e., buying energy when it is inexpensive, and 
selling it later to the grid for a higher price [9]. This problem 
has been studied assuming that prices are known in advance in 
a finite horizon setting. These assumptions allow determinis- 
tic optimization problem formulations which can be solved 
using linear programming techniques [10], [11]. However, 
such an approach does not take into account the stochasticity in 
prices and demands and does not allow for long-term cost 
optimization. In [12] the authors consider the problem of 
energy storage from the point of view of the grid operator, and 
propose a threshold policy that is shown to be asymptotically 
optimal as the size of the storage unit increases. 

A model similar to ours was used to investigate control 
of energy storage in the context of data centers [8]. The 
model in [8] assumes that the battery is fully efficient and 
the proposed scheduling algorithm is a sub-optimal heuristic, 
whose gap from optimality increases as storage size decreases. 
In [13] this approach is extended to multiple data centers, 
each with different time-varying prices. In contrast to [8] 
and [13], our model incorporates battery inefficiencies and we 
investigate the optimal scheduling policy. 

A related problem is optimal control of energy storage for 
renewable energy. The two-price case is considered in [14], 
while the more general case is discussed in [15] and [16]. 
The case without transmission costs in [15] is closely related 
to our setting, and a similar two-threshold policy is shown to 
be optimal. In recent work [16], the authors consider a model 
similar to ours to address storage control for renewable energy 
generation for general prices, and show that the average cost 
optimal policy has a similar threshold structure. The authors 
of [15] and [16] use a finite-horizon and infinite-horizon 
average criterion, respectively. The battery and price models 
differ slightly, and the work in [16] accounts for dissipation 
losses, but does not allow state-dependent charging constraints. 
In contrast to [16], our approach incorporates periodicity, 
which allows us to model daily fluctuations in price and 
demand. 

Our model is closely related to periodic-review, single-item 
inventory models, and the optimal policy mirrors the optimal- 
ity of the base-stock policy for the backlog case. However, in 
our case the demand is known before the purchasing decision, 
and we require that all demand is met in the time slot that it 
arises. In addition, the state description is continuous, and the 
battery inefficiency fundamentally changes the structure and 
analysis of the optimal policy. 

The remainder of the paper is structured as follows. In 
Section II we introduce the model and describe the decision 
variables. In Section III we demonstrate the optimality of a 
threshold policy, and in Section IV we derive some properties 
of the thresholds. Section V discusses various numerical 
examples, and Sections VI and VII describe some directions for 
future research and provide concluding remarks, respectively. 

II. Model 

Consider a user with certain energy requirements and a 
battery that can be used for energy storage. Time is slotted, 
and we denote by $B(t)$ the battery level (state of charge) in 
kWh at time $t$, $t = 0, 1, \dots$. Let $\bar{B}$ represent the maximum 
battery level, and $\mathcal{B} = [0, \bar{B}]$ the range of all possible battery 
levels, so 

$$B(t) \in \mathcal{B}. \tag{1}$$

In each time slot $t$ some demand $D(t)$ arises, and we may 
purchase energy at a price of $P(t)$ per unit. The demand 
has some compact support, $D(t) \in \mathcal{D}$, as does the price, 
$P(t) \in \mathcal{P}$. Both are bounded, and we denote by $\bar{D}$ and $\bar{P}$ 
the maximum demand and price, respectively, so $\mathcal{D} \subseteq [0, \bar{D}]$ 
and $\mathcal{P} \subseteq [p_{\min}, \bar{P}]$, with $p_{\min}$ the minimum price. Finally, we 
denote by $\mathcal{M}$ the set of modulating states that influence the 
price and demand transitions, for example the time of day or 
the season. 

Denote by $\Omega = \mathcal{D} \times \mathcal{P} \times \mathcal{M}$ the set of possible realizations 
of demand, price and modulating state, and for any $x \in \Omega$, 
denote by $d(x)$ and $p(x)$ the corresponding demand and price, 
respectively. Demand and price may be correlated, and we 
denote by $f_x(y)$ the probability density function of moving 
from state $x$ to state $y$ in the next slot, for any $x, y \in \Omega$. 

The battery may not be completely efficient, and its performance 
is affected by the charging efficiency $\eta_c \in (0, 1]$ 
and discharging efficiency $\eta_d \in (0, 1]$. Energy purchased to 
charge the battery is reduced by a factor $\eta_c$, and only a fraction 
$\eta_d$ of the discharged energy is converted into electricity. This 
model is general and encompasses for example batteries of 
electric vehicles, uninterrupted power supplies of data centers 
and batteries dedicated to end user storage. The model is not 
intended to capture all the subtleties of battery behavior in 
each of these applications but rather the essential tradeoffs 
and phenomena that arise in practice. In Section VI we 
discuss various model refinements that can be made without 
fundamentally altering the results or the derivations. 

In addition to satisfying the demand from the battery, we 
also allow demand to be met directly from the grid, bypassing 
the battery. Let $A_1(t)$ denote the amount of energy purchased 
directly from the grid in slot $t$, $A_2(t)$ the amount of energy 
bought to charge the battery, and $A_3(t)$ the energy discharged 
from the battery used towards satisfying demand; see Figure 1. 

Figure 1. A graphical representation of the model. 



We assume 

$$A_1(t), A_2(t), A_3(t) \ge 0, \tag{2}$$

and require that all demand must be met, i.e., 

$$D(t) = A_1(t) + \eta_d A_3(t). \tag{3}$$



The battery has state-dependent charging constraints $A_c(b)$ and 
$A_d(b)$, so the amount of energy that can be charged to and 
discharged from the battery is bounded as 

$$A_2(t) \le A_c(b), \qquad A_3(t) \le A_d(b), \tag{4}$$

where $b = B(t)$ is the current battery level. 



The battery level evolves according to 

$$B(t+1) = B(t) + \eta_c A_2(t) - A_3(t), \tag{5}$$

and the energy costs in slot $t$ are given by 

$$g(t) = \big(A_1(t) + A_2(t)\big) P(t). \tag{6}$$

We are interested in the total discounted costs $\sum_{t=0}^{\infty} g(t)\alpha^t$, 
with $0 < \alpha < 1$ the discount factor that represents value 
reduction over time. The reason for considering discounted 
rather than average costs is to emphasize early decisions 
and costs, in order to emulate the effect of reduced battery 
efficiency over time. Note that the total discounted costs are 
finite, since the per-slot costs are bounded. Our goal is now 
to choose in each slot the $A_1(t)$, $A_2(t)$ and $A_3(t)$ that solve 
the following optimization problem: 

$$\min \; \mathbb{E}\Big\{ \sum_{t=0}^{\infty} g(t)\alpha^t \Big\} \quad \text{subject to (1), (2), (3), (4), (5).} \tag{7}$$



The infinite horizon problem belongs to a class of stochastic 
optimization problems, which in general are difficult to solve. 
A first step towards dealing with (7) is to see that it is 
never optimal to charge and discharge the battery in the 
same slot. This is intuitively clear, because charging and 
discharging the battery simultaneously corresponds to routing 
$\min\{A_2(t), \eta_d A_3(t)\}$ energy from the grid to the user, through 
the battery. Because of the battery inefficiency it is beneficial 
to instead circumvent the battery, and satisfy the demand 
directly from the grid. 
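
The argument can be made concrete with a short calculation. This is a sketch that uses only the demand constraint (3) and the battery dynamics (5); the quantity $\varepsilon$ is introduced here purely for illustration. Suppose $\varepsilon > 0$ units are bought to charge the battery and the resulting increase in battery level is discharged again in the same slot:

```latex
\begin{align*}
\text{increase in battery level} &= \eta_c\,\varepsilon, \\
\text{energy delivered to the user} &= \eta_d\,\eta_c\,\varepsilon \;\le\; \varepsilon, \\
\text{cost saved by buying } \eta_d\eta_c\varepsilon \text{ directly instead}
  &= (1 - \eta_c\eta_d)\,\varepsilon\,P(t) \;\ge\; 0 .
\end{align*}
```

Both alternatives meet the same demand and leave the battery level unchanged, so the direct purchase is never more expensive, and it is strictly cheaper whenever $\eta_c \eta_d < 1$ and $P(t) > 0$.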



It turns out that this observation significantly simplifies the 
optimization problem, by reducing the number of decision 
variables. Specifically, in view of the restriction on simultaneous 
charging and discharging: 

$$A_1(t) = D(t) + \big(B(t+1) - B(t)\big)\,\eta_d\,\mathbb{1}_{\{B(t+1) < B(t)\}},$$
$$A_2(t) = \big(B(t+1) - B(t)\big)\,\eta_c^{-1}\,\mathbb{1}_{\{B(t+1) > B(t)\}},$$
$$A_3(t) = -\big(B(t+1) - B(t)\big)\,\mathbb{1}_{\{B(t+1) < B(t)\}}.$$

Thus, the choice for $B(t+1)$ fixes $A_1(t)$, $A_2(t)$ and $A_3(t)$, 
and (7) reduces to a single-variable decision problem. The 
per-slot costs may be rewritten in terms of $B(t+1)$ as 

$$g(t) = \Big(D(t) + \big(B(t+1) - B(t)\big)^+ \eta_c^{-1} + \big(B(t+1) - B(t)\big)^- \eta_d\Big) P(t),$$

with $(x)^+ = \max\{x, 0\}$ and $(x)^- = -\max\{-x, 0\}$. Note 
that (4) can be expressed in $B(t+1)$ as 

$$-A_d(B(t)) \le B(t+1) - B(t) \le A_c(B(t))\,\eta_c.$$
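
As a minimal sketch of this reduction, the following Python function recovers the three purchase quantities and the per-slot cost from a chosen next battery level, using the demand constraint (3), the battery dynamics (5) and the cost (6). The function name and argument names are illustrative assumptions and do not appear in the model itself.

```python
def slot_actions(b, b_next, demand, price, eta_c=1.0, eta_d=1.0):
    """Return (a1, a2, a3, cost) for one slot.

    b: current battery level, b_next: chosen next battery level.
    Assumes no simultaneous charging and discharging, as argued in the text.
    """
    delta = b_next - b
    if delta > 0:                 # charging
        a2 = delta / eta_c        # grid energy needed to raise the level by delta
        a3 = 0.0
    else:                         # discharging, or idle when delta == 0
        a2 = 0.0
        a3 = -delta               # energy drawn from the battery
    a1 = demand - eta_d * a3      # remaining demand met directly from the grid
    if a1 < 0:
        raise ValueError("discharge exceeds demand: b_next is not feasible")
    cost = (a1 + a2) * price      # per-slot energy cost
    return a1, a2, a3, cost
```

For instance, discharging one unit with `eta_d = 0.5` covers only half a unit of demand, so `slot_actions(2.0, 1.0, 3.0, 10.0, eta_d=0.5)` buys the remaining 2.5 units from the grid.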

III. The structure of the optimal policy 

In this section we discuss how to choose in each slot 
the value for $\beta = B(t+1)$ that minimizes the total 
discounted costs. To this end, we rewrite our model as a 
Markov decision process. We denote by $J_x(b)$ the minimal 
total discounted costs, starting from state $x \in \Omega$ and 
battery level $b \in \mathcal{B}$. Let 

$$\gamma_x(\delta) = \big(d(x) + (\delta)^+ \eta_c^{-1} + (\delta)^- \eta_d\big)\, p(x)$$

denote the immediate costs for a battery differential $\delta$, let 

$$G_x(\beta) = \int_{y \in \Omega} f_x(y) J_y(\beta)\, dy,$$

and define 

$$H_x(\beta, b) = \gamma_x(\beta - b) + \alpha G_x(\beta),$$

the total discounted costs given battery level $b$, state $x$ and 
action $\beta$. Then the cost function satisfies the Bellman equation 

$$J_x(b) = \inf_{\beta \in U_x(b)} H_x(\beta, b), \tag{8}$$

with $U_x(b)$ the control set that contains all feasible decisions 
for $B(t+1)$. This set may be written as 

$$U_x(b) = [U_x^-(b), U^+(b)],$$

where $U_x^-(b) = \max\{0, b - d(x), b - A_d(b)\}$ and $U^+(b) = 
\min\{\bar{B}, b + A_c(b)\}$. 

It is readily verified that the infimum in (8) can be attained 
by a stationary optimal policy, see, e.g., [17, Proposition 4.4]. 
We shall demonstrate that this optimal policy specifies for each 
state $x \in \Omega$ two battery thresholds $\beta_x^-, \beta_x^+ \in \mathcal{B}$, $\beta_x^- \le \beta_x^+$, 
such that the cost-minimizing choice for the battery level $\beta_x^*(b)$ 
is given by 

$$\beta_x^*(b) = \begin{cases} \min\{\beta_x^-, U^+(b)\}, & b < \beta_x^-, \\ \max\{\beta_x^+, U_x^-(b)\}, & b > \beta_x^+, \\ b, & \text{otherwise.} \end{cases} \tag{9}$$



So if $b < \beta_x^-$, then the optimal policy is to charge the battery 
as close to $\beta_x^-$ as the control set allows. If $b > \beta_x^+$, then the 
battery should be discharged down to $\beta_x^+$ within the boundaries 
of the control set. In case $\beta_x^- \le b \le \beta_x^+$, it is optimal to 
neither charge nor discharge the battery. So if the battery level 
is sufficiently low, the battery is charged, and all demand is 
satisfied directly from the grid, while for high battery level, 
demand is (partially) satisfied from the battery. When the 
battery level is between both thresholds, the battery is neither 
charged nor discharged, and all demand is met from the grid. 
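
The two-threshold rule described above can be sketched in a few lines of Python. The function and argument names are assumptions made for illustration; the thresholds and the bounds of the control set are taken as given inputs for the current state and battery level.

```python
def threshold_policy(b, beta_lo, beta_hi, u_lo, u_hi):
    """Next battery level under the two-threshold rule.

    beta_lo, beta_hi: charging and discharging thresholds (beta_lo <= beta_hi).
    u_lo, u_hi: bounds of the control set at battery level b (u_lo <= b <= u_hi).
    """
    if b < beta_lo:
        return min(beta_lo, u_hi)   # charge towards the lower threshold
    if b > beta_hi:
        return max(beta_hi, u_lo)   # discharge towards the upper threshold
    return b                        # between the thresholds: leave the battery idle
```

For example, with thresholds 4 and 6 and a control set that only allows charging up to level 3, a battery at level 1 is charged to 3 rather than to 4.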

In order to show that (9) indeed describes the structure of 
the optimal policy, we require the following lemma. 

Lemma 1: The cost function J x (b) is convex and non- 
increasing in b. 

The proof of Lemma 1 relies on the fact that the total 
discounted costs can be viewed as the limit of finite-horizon 
discounted costs, which may be shown to possess these 
properties by induction on the horizon. The proof can be found 
in the appendix, along with the other proofs. 

We are now in position to state and prove our main result. 

Theorem 1: The policy that solves the minimization problem 
(7) is of the form $\beta = \beta_x^*$ as in (9). 

By Lemma 1 we know that the right-hand side of the 
Bellman equation (8) in fact defines a convex optimization 
problem, the solution of which can be found by finding the 
$\beta \in U_x(b)$ for which the subdifferential $\partial H_x(\beta, b)$ has 
the proper shape. The proof of Theorem 1 then relies on close 
inspection of this subdifferential. 

In case the battery is fully efficient, the charging threshold 
and discharging threshold are identical, as is shown in the next 
corollary. 

Corollary 1: Let $\eta_c = \eta_d = 1$, then the optimal policy is 
as in (9), with $\beta_x^- = \beta_x^+$. 

In Section VI we present several model extensions under 
which Theorem 1 remains valid. 

The optimal policy (9) is illustrated in Figure 2, which 
shows $\beta_x^*(b)$ plotted against $b$. The diagonal segment in the 
middle corresponds to $\beta_x^*(b) = b$, while the two horizontal 
lines represent the thresholds $\beta_x^-$ and $\beta_x^+$. The outer diagonal 
segments represent the boundaries of the control space 
$U_x(b)$. Both horizontal lines coincide in the scenario with a 
completely efficient battery ($\eta_d \eta_c = 1$). 



Figure 2. The structure of the optimal policy $\beta_x^*$ as a function of $b$. 



IV. Battery level thresholds 

In the previous section we have established the threshold-based 
structure of the optimal policy (9). Analytically determining 
the thresholds $\beta_x^-$ and $\beta_x^+$ is a difficult problem in 
general, and in this section we present some structural results 
and results for special cases. For simplicity, we limit ourselves 
to the case $\eta_d = \eta_c = 1$, and denote $\beta_x = \beta_x^- = \beta_x^+$. Similar 
properties can be shown to hold for the inefficient case. 

We first present sufficient conditions for the thresholds to 
be equal to either $0$ or $\bar{B}$. 

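Proposition 1: Let $x \in \Omega$. If for all $b_1, b_2 \in \mathcal{B}$, $b_1 < b_2$, 
$y \in \Omega$, 

$$J_y(b_1) - J_y(b_2) < (b_2 - b_1)\,\frac{p(x)}{\alpha},$$

then $\beta_x = 0$. If for all $b_1, b_2 \in \mathcal{B}$, $b_1 < b_2$, $y \in \Omega$, 

$$J_y(b_1) - J_y(b_2) > (b_2 - b_1)\,\frac{p(x)}{\alpha},$$

then $\beta_x = \bar{B}$. 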

This result can for example be used to show that $\beta_x = 0$ 
in certain special cases, as is described in the following 
proposition. 

Proposition 2: Let $x \in \Omega$ and denote $p_{\min} = \min_{p \in \mathcal{P}} p$, 
then $\beta_x = 0$ if either: 
(i) $p(x) = \bar{P}$; 
(ii) $p_{\min} > 0$ and $\alpha < p_{\min}/\bar{P}$. 

Proposition 2 states that if the price is very high, or the 
discount factor is sufficiently low, it is optimal not to charge 
the battery at all, and to try to satisfy the demand from the 
battery as much as possible. 

In case the state transitions are i.i.d. ($f_x = f$) or if 
the transition probabilities are determined by the price level 
($f_x = f_{p(x)}$), the thresholds $\beta_x$ depend on the price only and 
are independent of the demand and modulating state. In this 
case, write $\beta_x = \beta_{p(x)}$. We can show that for i.i.d. prices, the 
thresholds are decreasing in the price level. 

Proposition 3: Assume that the prices and demands are 
i.i.d. across time, then $\beta_{p(x)}$ is decreasing in $p$. 

The monotonicity observed in Proposition 3 is very intuitive, 
but does not extend to Markovian prices. The reason is that 
threshold values are partially determined by the evolution of 
the price: even for a low price it might be beneficial to set a 
low threshold, if the price in the next slot is even lower. Such 
an example is presented below. For ease of presentation we 
use a discrete probability distribution $f$ in this example, noting 
that similar examples can be constructed using continuous 
transition probabilities. 

Example 1: Consider an example with four price levels 
$p_i = i$, $i = 1, \dots, 4$. We assume that $\alpha > 3/4$, $\bar{B} = 1$ 
and $D = 1$, and we choose the following price transition 
probabilities: 

$$f_1(1) = f_1(3) = 1/2, \qquad f_2(1) = f_3(4) = f_4(2) = 1.$$

The corresponding thresholds are $\beta_1 = \beta_3 = 1$ and $\beta_2 = 
\beta_4 = 0$, which are clearly not monotone. The derivation of 
these thresholds is presented in Appendix G. 
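
The thresholds of Example 1 can also be checked numerically. The sketch below is not the derivation from the appendix; it runs plain value iteration on a discretized battery grid under the following assumptions, which are ours: the demand equals 1 in every slot, the battery is fully efficient, there are no charging constraints, and the threshold of a price level is taken to be the battery level that minimizes the purchase cost plus the discounted expected cost-to-go. The function name and the grid size are illustrative.

```python
def example1_thresholds(alpha=0.9, n_levels=11, n_iter=500):
    prices = [1.0, 2.0, 3.0, 4.0]
    # trans[i][j]: probability of moving from price level i+1 to price level j+1
    trans = [
        [0.5, 0.0, 0.5, 0.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 1.0, 0.0, 0.0],
    ]
    demand = 1.0
    levels = [k / (n_levels - 1) for k in range(n_levels)]  # battery grid on [0, 1]
    J = [[0.0] * n_levels for _ in prices]                  # J[x][k]: cost-to-go estimate

    def q_values(x):
        # beta * p(x) + alpha * E[J(next price, beta)]: the part of the
        # one-step objective that depends on the chosen next level beta
        return [
            levels[k] * prices[x]
            + alpha * sum(trans[x][y] * J[y][k] for y in range(4))
            for k in range(n_levels)
        ]

    for _ in range(n_iter):
        best = [min(q_values(x)) for x in range(4)]
        # Since demand >= battery capacity, every level in [0, 1] is reachable
        J = [
            [prices[x] * (demand - levels[k]) + best[x] for k in range(n_levels)]
            for x in range(4)
        ]

    thresholds = []
    for x in range(4):
        q = q_values(x)
        thresholds.append(levels[q.index(min(q))])
    return thresholds
```

With the default discount factor 0.9, which satisfies the condition of the example, the function returns `[1.0, 0.0, 1.0, 0.0]` for price levels 1 to 4: the battery is filled at prices 1 and 3 and emptied at prices 2 and 4.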

V. Numerical evaluation 

In this section we evaluate the operation of the optimal 
energy storage management policy (9) in residential environments 
and real-time pricing (RTP) scenarios. Our goal is 
to demonstrate the practical feasibility of the optimal policy 
and the extent of its cost savings under RTP and demand 
fluctuations that might arise in real life. We first describe 
the price and demand datasets used for the evaluation and 
then outline our low complexity implementation of the optimal 
policy. We evaluate the cost savings of the optimal policy in 
scenarios of individual home storage units and shared energy 
storage. 

A. Price and demand datasets 

We emulate residential RTP data with historical hourly 
spot prices of the Ontario energy market [18]. Although 
Ontario currently does not use RTP, the spot prices provide 
a reasonable RTP estimate. The residential demand data is 
synthetically generated using the tool in [19]. This tool uses a 
high-resolution model of domestic electricity usage based on 
patterns of home occupancy and appliance usage, weather con- 
ditions and characteristics of all major appliances commonly 
found in the domestic environment. 

In our approach, the optimal thresholds can be dynamically 
determined using empirical distributions of historical price and 
demand data for each time slot of the day. The smallest time 
slot is determined by the coarsest granularity between price 
and demand data. In our experiments, we have hourly price 
data and 1-minute demand data, therefore we use time slots 
of one hour. We also use a training window of one month. 
We found that this window size is long enough to provide 
adequate characterization of the distribution of each hour of 
the day, and short enough to use the optimal thresholds for 
the prices and demands that appear in the next window. 

For concise presentation, we show results for a representa- 
tive scenario where a home of four occupants equipped with a 
battery receives hourly prices from the Ontario energy market 
during January and February 2011. 

Figure 3(a) plots the average, minimum and maximum 
hourly price in Ontario during January 2011. We observe one 
active phase with multiple price peaks between 9 a.m. and 10 
p.m., and low prices at night. Figure 3(b) 
shows that the February prices follow a similar trend. 





Figure 3. Hourly average, minimum and maximum Ontario spot market 
prices (Ontario, 2011 dataset): (a) January, (b) February. 

We first determine the empirical distribution of the January 
2011 Ontario hourly energy prices and demand data. The 
empirical price (demand) distribution for each hour of the 
day is computed by dividing the number of observations of 
a specific price (demand) level by the total number (31) of 
observations. These distributions are used to determine the 
thresholds of the optimal policy. Then, we use the February 
2011 prices and demands to emulate the operation of the 
optimal policy and compute the resulting electricity cost. 
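
The empirical distributions described above amount to relative frequencies per hour of the day. A minimal sketch in Python, assuming the data is available as (hour, level) pairs; this input format and the function name are our own assumptions, not the format of the datasets:

```python
from collections import Counter, defaultdict


def hourly_empirical_distribution(observations):
    """Relative frequency of each observed level, per hour of the day.

    observations: iterable of (hour_of_day, level) pairs, one per day and hour.
    Returns a dict {hour: {level: relative frequency}}.
    """
    counts = defaultdict(Counter)
    for hour, level in observations:
        counts[hour][level] += 1
    return {
        hour: {level: n / sum(c.values()) for level, n in c.items()}
        for hour, c in counts.items()
    }
```

Applied to one month of hourly prices, each hour receives 31 observations in January, and the resulting frequencies sum to one for every hour.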

B. Implementation 

Analytical determination of the optimal thresholds is possible 
for special cases but is difficult in general (see Section IV). 
Instead, we compute the thresholds numerically using policy 
iteration. Since policy iteration can be computationally in- 
tensive in practice, we have reduced the complexity of our 
implementation as follows. First, we discretize the state space 
by rounding the demand data and energy storage level to 
multiples of 0.5 kWh, and the price data to multiples of 5 ct. 
Second, we use the hour of the day as the modulating state 
and assume that prices and demand are independent from hour 
to hour (but not necessarily identically distributed). Thus, the 
optimal thresholds depend on price realization and the hour of 
the day, but are independent of the demand realization. The 
modulating variable $\mathcal{M}$ could also be used to differentiate 
between different days and months and to take into account 
weather conditions. Such detailed state description would 
likely yield higher cost savings as it would allow more accurate 
price and demand predictions. On the other hand, it might 
generate too large a state space for policy iteration to be 
feasible. The following experiments demonstrate that using 
our simple choice of the modulating state yields very high 
cost savings. 

We evaluate the optimal policy for a fully efficient battery 
($\eta_d = \eta_c = 1$) and no charging constraints ($A_c(b) = A_d(b) = 
\infty$), thus obtaining an upper bound on the potential cost 
savings of energy storage. We implement the policy iteration 
algorithm in Matlab; each slot in the policy iteration algorithm 
corresponds to one hour. For the above parameters and a 
discount factor $\alpha = 0.99$, a laptop PC with a quad-core 
2.2 GHz Intel processor and 8 GB RAM typically requires 
approximately 5 min to compute the optimal thresholds. 
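
For readers who want to experiment, the computation can be sketched in Python. Note that this sketch uses value iteration rather than the policy iteration of our Matlab implementation, and that it rests on several assumptions: a fully efficient battery without charging constraints, the hour of the day as the only modulating state, price and demand independent of each other within an hour, and a battery grid that contains level 0. All function and parameter names are illustrative.

```python
def value_iteration(price_dist, demand_dist, levels, alpha=0.99, n_iter=200):
    """Expected cost-to-go per hour and battery level.

    price_dist[h], demand_dist[h]: dicts {value: probability} for hour h.
    levels: increasing grid of battery levels, starting at 0.
    Returns V with V[h][k] the expected discounted cost from the start of
    hour h with battery level levels[k], before price and demand are seen.
    """
    hours = len(price_dist)
    V = [[0.0] * len(levels) for _ in range(hours)]
    for _ in range(n_iter):
        new_V = []
        for h in range(hours):
            nxt = V[(h + 1) % hours]
            row = []
            for b in levels:
                total = 0.0
                for p, prob_p in price_dist[h].items():
                    for d, prob_d in demand_dist[h].items():
                        lo = max(0.0, b - d)  # cannot discharge more than the demand
                        best = min(
                            (d + beta - b) * p + alpha * nxt[k]
                            for k, beta in enumerate(levels)
                            if beta >= lo
                        )
                        total += prob_p * prob_d * best
                row.append(total)
            new_V.append(row)
        V = new_V
    return V


def threshold(V_next, levels, price, alpha=0.99):
    """Level minimizing beta * price + alpha * V_next(beta): the single threshold."""
    q = [beta * price + alpha * v for beta, v in zip(levels, V_next)]
    return levels[q.index(min(q))]
```

As a usage example, for a two-hour cycle with prices 1 and 3 and unit demand, `threshold(V[1], levels, 1.0, alpha)` is the threshold in the cheap hour (it uses the cost-to-go of the following hour) and evaluates to the full battery level, while the threshold in the expensive hour is 0.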

C. Energy savings 

Figure 4 shows the relative energy cost savings of the optimal 
policy over the setting without storage, as a function of the 
battery size $B_{\max}$. Three important observations are in order. 
First, the savings increase with battery size to up to 38%, 
which is significant. Second, the savings reach their maximum 
at $B_{\max} = 16$ kWh; increasing battery size beyond this point 
does not increase savings, as the optimal policy will not utilize 
any battery capacity beyond 16 kWh. This saturation point 
can be explained by the fact that the value of stored energy 
decreases over time due to the discounting of the costs and 
the cyclic price and demand levels. Third, the size of a typical 
hybrid vehicle battery pack is in the order of 16 kWh [9]. 
This suggests that car battery packs are well-suited for home 
energy storage since their size corresponds to the amount of 
storage required by the optimal energy storage policy. 

Figure 4. Energy cost savings of optimal storage (over no energy storage) 
vs. battery size (Ontario, February 2011 dataset). 

D. Structure of the optimal policy 

In the remainder of this section we consider a battery with 
capacity of 16 kWh. To illustrate how the optimal storage 
policy works, we show in Figure 5 the thresholds as a function 
of time, for two price levels (15 ct and 25 ct). We observe that 
the thresholds peak early in the morning when the price is low. 
Moreover, the lower 15 ct price level yields higher thresholds 
at all hours, in accordance with Proposition 3. Due to low 
demand and several consecutive hours with low prices, the 
15 ct threshold drops to zero at 11 p.m., and increases again 
the next hour. 




Figure 5. Optimal thresholds vs. hour of day, for 15 ct/kWh (black) and 
25 ct/kWh (gray) price levels. 

Figure 6 shows the average amount of energy bought, 
plotted against the hour of the day, with optimal energy storage 
(black line) and without storage (gray line). We observe that 
storage allows users to purchase energy early in the morning 
when the price is low, while users without storage are forced 
to buy energy during peak hours, at higher prices. 

E. Resource pooling 

An alternative to energy storage for individual households 
is to pool storage capacity: rather than individual users each 
having a small battery, further cost savings might be achieved 
by multiple users sharing a single large battery. Figure 7 
compares the case of $n$ users each with a 16 kWh battery 
to the case of a shared $16 \times n$ kWh battery. We use the tool 
from [19] to generate distinct demand data for each individual 
home user. In the scenario with storage but without pooling, 
each home has its own storage, and each set of demand 
data corresponds to one energy storage unit. In this case, 
the thresholds are computed for each individual user. In the 
shared storage scenario, the aggregate demand data of all $n$ 
homes is input to the shared storage unit. Since this scenario 
only has a single storage unit (of size $16n$), we only have 
to compute the optimal thresholds once. The figure plots 
the aggregate monthly costs without storage (black), with storage 
but without pooling (gray) and with pooled storage (blue) 
against the number of users. 

Figure 6. The average amount of energy bought (kWh), with (black) and without 
(gray) storage. 

Figure 7. The effect of resource pooling (vertical axis: costs in ct). 

We observed similar performance when running the same 
experiment with the demands of these users artificially 
time-shifted to be negatively correlated (results not shown for 
brevity). This can be explained by the fact that the prices 
are the same for all users. Thus, irrespective of the size of 
the battery, the behavior of the optimal policy is primarily 
influenced by the common pricing signal, eliminating the 
potential benefits of pooling. 

VI. Extensions & Outlook 

Our model is a first step towards energy storage management 
and focuses on a single user that utilizes storage to minimize 
its own costs. The ideas and results can also be applied in 
multi-user settings where the objective is to minimize the grid 
peak load. For example, in a scenario with a large population, 
a certain fraction of battery-equipped users could apply the 
optimal control policy (9) to shift their demand from peak to 
off-peak hours. Assessing the impact of such an approach is 
an interesting topic for future research. 

The model can be extended in several directions to cover 
a broader variety of applications, and more realistic battery 
models. Below we briefly describe three such extensions. 

A. Battery replacement costs 

In order to assess the cost effectiveness of energy storage, 
one has to incorporate battery replacement costs. Determining 
the optimal policy in this case requires solving a joint problem 
of battery dimensioning and energy storage management for 
cost minimization or battery lifetime maximization. Smaller 
batteries are cheaper but, as shown in Figure 4, they may 
provide less opportunity to exploit price fluctuations. The 
optimal size of the battery will most likely depend on the 
spread and volatility of energy prices and may differ between 
energy markets [20]. 

As a first-order approximation of the battery replacement 
costs, we may assume that the battery breaks down after 
a geometric number of operations (charging/discharging), at 
which point the battery needs replacement at costs C. Under 
these assumptions, the analysis presented in this paper largely 
holds, with some minor modifications. Assuming that battery 
lifetime has mean 1/q, the immediate costs can be rewritten 
as 

$$\gamma_x(\delta) = \big(d(x) + (\delta)^+ \eta_c^{-1} + (\delta)^- \eta_d\big)\, p(x) + qC\, \mathbb{1}_{\{\delta \neq 0\}}.$$

This new cost function is convex everywhere but in $\delta = 0$, 
and Lemma 1 and Theorem 1 must be modified accordingly. 

B. Self-discharge and varying efficiency 

We may extend the battery model by including the effects 
of self-discharge and time-varying efficiency. That is, in every 
time slot we assume that only a fraction $\xi$ of the stored 
energy is retained (the remainder dissipates), and the storage 
evolution (5) is modified accordingly: 

$$B(t+1) = \xi B(t) + \eta_c(t) A_2(t) - A_3(t).$$

The rest of the paper can be adjusted in a similar fashion, and 
all results and derivations continue to hold. The case $\xi = 1$ 
and $\eta_c(t) = \eta_c$ corresponds to the present model. 
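
A small sketch of the modified storage evolution, in which the stored energy is multiplied by the self-discharge factor in every slot. The function names are illustrative assumptions; the helper `idle_levels` simply shows how an unused battery drains over time.

```python
def next_level(b, a2, a3, eta_c=1.0, xi=1.0):
    """Battery level after one slot with self-discharge factor xi.

    a2: energy bought to charge the battery, a3: energy discharged.
    With xi = 1 this is the evolution of the base model.
    """
    return xi * b + eta_c * a2 - a3


def idle_levels(b0, xi, n_slots):
    """Battery levels over n_slots slots without charging or discharging."""
    levels = [b0]
    for _ in range(n_slots):
        levels.append(next_level(levels[-1], 0.0, 0.0, xi=xi))
    return levels
```

For example, `idle_levels(8.0, 0.5, 3)` halves the stored energy in each slot, which illustrates why self-discharge reduces the value of energy that is stored for a long time.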

C. Local energy generation 

Another straightforward extension of our model is to assume 
that the end user itself generates some energy. This energy 
is either used to satisfy demand, is sold back to the grid or 
stored in the battery. This extension allows us to model for 
example wind energy farms, where energy storage can be used 
to successfully incorporate the increasing amount of renewable 
energy into the grid. Our results then extend to describe an 
optimal policy for the joint decision on buying, selling and 
storing (renewable) energy. 

In order to make these adjustments, we must reinterpret 
D(t) as the demand minus the amount of energy generated 
in a slot, and remove the constraints (2) that state that $A_1(t)$, 
$A_2(t)$ and $A_3(t)$ are non-negative. 



VII. Conclusions 

In this paper we studied the control of end-user energy 
storage under price fluctuations. We derived the structure of the 
cost-minimizing storage policy, which turns out to be a simple 
threshold-based policy (9). We described the behavior of these 
thresholds for some special cases, and showed by means of a 
numerical study that energy storage can lead to significant 
cost savings. We discussed various model extensions and 
generalizations that would broaden the scope of this work, 
without fundamentally altering the results. 

Appendix 

A. Proof of Lemma 1 

Proof: Let $J_{x,n}(b)$ denote the minimal n-step discounted 
costs, starting from state $x$ and battery level $b$. 
Let $H_{x,n}(\beta, b) = \gamma_x(\beta - b) + \alpha G_{x,n}(\beta)$ and 
$G_{x,n}(\beta) = \int_{y \in \Omega} f_x(y) J_{y,n}(\beta)\,dy$, then $J_{x,n}(b)$ satisfies 

$$J_{x,n}(b) = \min_{\beta \in U_x(b)} H_{x,n-1}(\beta, b). \qquad (10)$$



These finite-horizon costs converge as $\lim_{n \to \infty} J_{x,n} = J_x$, 
cf. [11, Proposition 4.4]. Thus, in order to show that $J_x$ is 
convex and non-increasing, it suffices to show by induction 
that this holds for all $J_{x,n}$. 

This statement trivially holds for $n = 0$, since $J_{x,0} = 0$. 
Now let $n \in \mathbb{N}$, and assume that $J_{x,n-1}$ is convex and 
non-increasing. The operator on the right-hand side of (10) can be 
identified as the infimal convolution operator, which preserves 
convexity [21, Theorem 5.4]. We have that $G_{x,n-1}$ is convex 
by the induction hypothesis, and since it can be readily verified 
that $\gamma_x$ is convex as well, so is $J_{x,n}$. 

Let $b_1, b_2 \in \mathcal{B}$, $b_1 < b_2$, then in order to establish that $J_{x,n}$ 
is non-increasing, we need to show that $J_{x,n}(b_1) \geq J_{x,n}(b_2)$. 
Denote by $\beta^*_x$ the choice for $\beta$ that achieves the minimum 
in (10). We distinguish two cases: (i) $\beta^*_x(b_1) \in U_x(b_2)$; and 
(ii) $\beta^*_x(b_1) < U^-_x(b_2)$. In case (i): 

$$\begin{aligned}
J_{x,n}(b_2) &\leq H_{x,n-1}(\beta^*_x(b_1), b_2) \\
&\leq \gamma_x(\beta^*_x(b_1) - b_1) + \alpha G_{x,n-1}(\beta^*_x(b_1)) = J_{x,n}(b_1),
\end{aligned}$$

by the fact that $\gamma_x$ is increasing. 

In case (ii) we have by the induction hypothesis 

$$\begin{aligned}
J_{x,n}(b_2) &\leq H_{x,n-1}(U^-_x(b_2), b_2) \\
&= \gamma_x(-d(x)) + \alpha G_{x,n-1}(U^-_x(b_2)) \\
&\leq \gamma_x(\beta^*_x(b_1) - b_1) + \alpha G_{x,n-1}(\beta^*_x(b_1)) = J_{x,n}(b_1),
\end{aligned}$$

completing the proof. ■ 
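
The finite-horizon recursion used in this proof can be sketched numerically. The Python example below is only an illustration under stated assumptions: the battery level lives on an integer grid, the stage cost `stage_cost` is one form consistent with the subdifferential of the cost function given in the proof of the main theorem, the control set allows discharging at most the demand and charging up to capacity, and all function names and example data (prices, transition probabilities, efficiencies) are hypothetical.

```python
import math


def stage_cost(price, demand, delta, eta_c, eta_d):
    """Assumed cost of changing the battery level by delta in one slot."""
    if delta >= 0:
        return price * (demand + delta / eta_c)   # buy demand plus charge
    return price * (demand + eta_d * delta)       # discharge offsets demand


def finite_horizon_costs(prices, trans, demand, capacity, alpha, horizon,
                         eta_c=1.0, eta_d=1.0):
    """Return J[x][b], the n-step discounted cost on the grid b = 0..capacity."""
    n_states = len(prices)
    levels = range(capacity + 1)
    max_discharge = math.floor(demand / eta_d)    # cannot serve more than demand
    J = [[0.0] * (capacity + 1) for _ in range(n_states)]   # zero-step cost is 0
    for _ in range(horizon):
        # G[x][beta]: expected cost-to-go when the slot ends at level beta
        G = [[sum(trans[x][y] * J[y][beta] for y in range(n_states))
              for beta in levels] for x in range(n_states)]
        # One step of the recursion: minimize stage cost plus discounted G
        J = [[min(stage_cost(prices[x], demand, beta - b, eta_c, eta_d)
                  + alpha * G[x][beta]
                  for beta in range(max(0, b - max_discharge), capacity + 1))
              for b in levels] for x in range(n_states)]
    return J


def is_non_increasing(values, tol=1e-9):
    return all(later <= earlier + tol
               for earlier, later in zip(values, values[1:]))


def is_convex(values, tol=1e-9):
    return all(values[i - 1] - 2 * values[i] + values[i + 1] >= -tol
               for i in range(1, len(values) - 1))


prices = [1.0, 2.0, 3.0, 4.0]
trans = [[0.25] * 4 for _ in prices]              # i.i.d. uniform prices
J = finite_horizon_costs(prices, trans, demand=1, capacity=5,
                         alpha=0.9, horizon=20, eta_c=0.9, eta_d=0.9)
for x, row in enumerate(J):
    print(x, is_non_increasing(row), is_convex(row))
```

On this example every row of `J` passes both checks, in line with the lemma; the grid and tolerance are conveniences of the sketch, not part of the proof.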

B. Proof of Theorem 1 

Proof: Since $\gamma_x$ and $J_x$ are convex, so is $H_x$, and the 
minimization of $H_x(\beta, b)$ over $\beta \in U_x(b)$ can be recognized 
as a convex optimization problem. Thus, $\beta^* \in U_x(b)$ is a 
global minimizer if and only if there exists some subgradient 
$g \in \partial H_x(\beta^*, b)$ such that for all $\beta \in U_x(b)$, 

$$g(\beta - \beta^*) \geq 0.$$

Thus, $\beta^* \in (U^-_x(b), U^+_x(b))$ is a minimizer iff 
$0 \in \partial H_x(\beta^*, b)$, while $\beta^* = U^-_x(b)$ and $\beta^* = U^+_x(b)$ are 
minimizers iff there exists some $g \in \partial H_x(\beta^*, b)$ such that 
$g \geq 0$ and $g \leq 0$, respectively. 

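The subdifferentials of $\gamma_x$ and $G_x$ are given by 

$$\partial \gamma_x(\delta) = \begin{cases}
\eta_d p(x), & \delta < 0, \\
[\eta_d p(x), \eta_c^{-1} p(x)], & \delta = 0, \\
\eta_c^{-1} p(x), & \delta > 0,
\end{cases}$$

and 

$$\partial G_x(\beta) = [\sigma^-_x(\beta), \sigma^+_x(\beta)],$$

for some $-\infty < \sigma^-_x(\beta) \leq \sigma^+_x(\beta) \leq 0$. The subdifferential of 
$H_x$ can then be written as (cf. [21, Theorem 23.8]): 

$$\partial H_x(\beta, b) = \begin{cases}
[\eta_d p(x) + \alpha \sigma^-_x(\beta),\ \eta_d p(x) + \alpha \sigma^+_x(\beta)], & \beta < b, \\
[\eta_d p(x) + \alpha \sigma^-_x(\beta),\ \eta_c^{-1} p(x) + \alpha \sigma^+_x(\beta)], & \beta = b, \\
[\eta_c^{-1} p(x) + \alpha \sigma^-_x(\beta),\ \eta_c^{-1} p(x) + \alpha \sigma^+_x(\beta)], & \beta > b.
\end{cases}$$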

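Consequently, $\beta^* = U^-_x(b)$ is optimal if 

$$\eta_d p(x) + \alpha \sigma^+_x(\beta^*) \geq 0.$$

Let $\beta^* \in (U^-_x(b), b)$, then $\beta^*$ is optimal if 

$$\eta_d p(x) + \alpha \sigma^-_x(\beta^*) \leq 0 \leq \eta_d p(x) + \alpha \sigma^+_x(\beta^*).$$

Let $\beta^* = b$, then $\beta^*$ is optimal if 

$$\eta_d p(x) + \alpha \sigma^-_x(\beta^*) \leq 0 \leq \eta_c^{-1} p(x) + \alpha \sigma^+_x(\beta^*).$$

Let $\beta^* \in (b, U^+_x(b))$, then $\beta^*$ is optimal if 

$$\eta_c^{-1} p(x) + \alpha \sigma^-_x(\beta^*) \leq 0 \leq \eta_c^{-1} p(x) + \alpha \sigma^+_x(\beta^*).$$

Finally, let $\beta^* = U^+_x(b)$, then $\beta^*$ is optimal if 

$$\eta_c^{-1} p(x) + \alpha \sigma^-_x(\beta^*) \leq 0.$$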
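Denote 

$$\begin{aligned}
\mathcal{B}^-_{1,x} &= \{\beta \in [0, B] : \eta_c^{-1} p(x) + \alpha \sigma^+_x(\beta) \geq 0\}, \\
\mathcal{B}^-_{2,x} &= \{\beta \in [0, B] : \eta_c^{-1} p(x) + \alpha \sigma^-_x(\beta) \leq 0\}, \\
\mathcal{B}^+_{1,x} &= \{\beta \in [0, B] : \eta_d p(x) + \alpha \sigma^+_x(\beta) \geq 0\}, \\
\mathcal{B}^+_{2,x} &= \{\beta \in [0, B] : \eta_d p(x) + \alpha \sigma^-_x(\beta) \leq 0\},
\end{aligned} \qquad (11)$$

and define 

$$\beta^-_{1,x} = \begin{cases} \min \mathcal{B}^-_{1,x}, & \text{if } \mathcal{B}^-_{1,x} \neq \emptyset, \\ B, & \text{otherwise}, \end{cases}
\qquad
\beta^-_{2,x} = \begin{cases} \max \mathcal{B}^-_{2,x}, & \text{if } \mathcal{B}^-_{2,x} \neq \emptyset, \\ 0, & \text{otherwise}, \end{cases}$$

$$\beta^+_{1,x} = \begin{cases} \min \mathcal{B}^+_{1,x}, & \text{if } \mathcal{B}^+_{1,x} \neq \emptyset, \\ B, & \text{otherwise}, \end{cases}
\qquad
\beta^+_{2,x} = \begin{cases} \max \mathcal{B}^+_{2,x}, & \text{if } \mathcal{B}^+_{2,x} \neq \emptyset, \\ 0, & \text{otherwise}. \end{cases}$$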

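Since $\sigma^-_x$ and $\sigma^+_x$ are non-decreasing (due to the fact that $G_x$ 
is convex and non-increasing), we have that 
$\beta^-_{1,x} \leq \beta^-_{2,x} \leq \beta^+_{1,x} \leq \beta^+_{2,x}$. 

Then, for any $\beta^-_x \in [\beta^-_{1,x}, \beta^-_{2,x}]$ and $\beta^+_x \in [\beta^+_{1,x}, \beta^+_{2,x}]$, the 
policy (9) is optimal. ■ 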
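
To make the two-threshold structure concrete, here is a minimal Python sketch of one reading of the threshold rule that is consistent with the optimality conditions in this proof: charge up to the lower threshold when the battery is below it, discharge down to the upper threshold when the battery is above it, and otherwise leave the level unchanged, always staying inside the feasible range for the slot. The function and argument names are hypothetical, and the numbers in the usage lines are invented for illustration.

```python
def threshold_policy(b, beta_minus, beta_plus, u_minus, u_plus):
    """Target battery level for current level b under a two-threshold rule.

    beta_minus, beta_plus -- charge-up-to and discharge-down-to thresholds,
                             assumed to satisfy beta_minus <= beta_plus
    u_minus, u_plus       -- lowest and highest level reachable in this slot
    """
    if b < beta_minus:
        target = beta_minus      # below the lower threshold: charge up to it
    elif b > beta_plus:
        target = beta_plus       # above the upper threshold: discharge down to it
    else:
        target = b               # between the thresholds: leave the level as is
    # Respect the limits on how far the level can move within one slot.
    return min(max(target, u_minus), u_plus)


print(threshold_policy(1.0, 3.0, 6.0, u_minus=0.0, u_plus=10.0))   # charges
print(threshold_policy(4.0, 3.0, 6.0, u_minus=3.0, u_plus=10.0))   # stays put
print(threshold_policy(9.0, 3.0, 6.0, u_minus=8.0, u_plus=10.0))   # limited discharge
```

In the last call the upper threshold cannot be reached within the slot, so the sketch returns the lowest feasible level instead.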

Remark 1: Note that the proof of Theorem 1 describes a 
continuum of optimal policies, since any choice for 
$\beta^-_x \in [\beta^-_{1,x}, \beta^-_{2,x}]$ and $\beta^+_x \in [\beta^+_{1,x}, \beta^+_{2,x}]$ defines a solution 
to the underlying optimization problem. 



C. Proof of Corollary 1 

Proof: In case $\eta_c = \eta_d = 1$, we see that $[\beta^-_{1,x}, \beta^-_{2,x}] = 
[\beta^+_{1,x}, \beta^+_{2,x}]$, and the result readily follows. ■ 

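D. Proof of Proposition 1 

Proof: We have that $\beta_x = 0$ if for all $\beta \in U_x(0)$, 

$$\begin{aligned}
& H_x(0, 0) \leq H_x(\beta, 0) \\
\iff\ & d(x)p(x) - (d(x) + \beta)p(x) + \alpha \int_{y \in \Omega} f_x(y)\,(J_y(0) - J_y(\beta))\,dy \leq 0 \\
\iff\ & \alpha \int_{y \in \Omega} f_x(y)\,(J_y(0) - J_y(\beta))\,dy \leq \beta p(x).
\end{aligned}$$

This holds if $J_y(b_1) - J_y(b_2) \leq (b_2 - b_1)\frac{p(x)}{\alpha}$ for all $b_1 < b_2$, 
$y \in \Omega$. 

In order to verify that $\beta_x = B$, we need to show that for 
all $\beta \in U_x(B)$, 

$$\begin{aligned}
& H_x(B, B) \leq H_x(\beta, B) \\
\iff\ & d(x)p(x) - (d(x) - (B - \beta))p(x) + \alpha \int_{y \in \Omega} f_x(y)\,(J_y(B) - J_y(\beta))\,dy \leq 0 \\
\iff\ & \alpha \int_{y \in \Omega} f_x(y)\,(J_y(\beta) - J_y(B))\,dy \geq (B - \beta)p(x).
\end{aligned}$$

This holds if $J_y(b_1) - J_y(b_2) \geq (b_2 - b_1)\frac{p(x)}{\alpha}$ for all $b_1 < b_2$, 
$y \in \Omega$. ■ 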
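
The two sufficient conditions stated in this proof are easy to test numerically once cost-to-go values are available on a grid of battery levels. The Python sketch below is an illustration only: the function names are hypothetical, and the cost-to-go values in the usage lines are invented linear functions rather than results from the paper.

```python
def empty_battery_condition(J, levels, price, alpha, tol=1e-12):
    """Check J_y(b1) - J_y(b2) <= (b2 - b1) * price / alpha for all b1 < b2, y."""
    return all(J_y[i] - J_y[j] <= (levels[j] - levels[i]) * price / alpha + tol
               for J_y in J
               for i in range(len(levels))
               for j in range(i + 1, len(levels)))


def full_battery_condition(J, levels, price, alpha, tol=1e-12):
    """Check J_y(b1) - J_y(b2) >= (b2 - b1) * price / alpha for all b1 < b2, y."""
    return all(J_y[i] - J_y[j] >= (levels[j] - levels[i]) * price / alpha - tol
               for J_y in J
               for i in range(len(levels))
               for j in range(i + 1, len(levels)))


levels = [0.0, 1.0, 2.0]
J = [[0.0, -1.0, -2.0],     # hypothetical cost-to-go, decreasing with slope 1
     [0.0, -3.0, -6.0]]     # hypothetical cost-to-go, decreasing with slope 3

# High current price: stored energy is worth less than it costs now.
print(empty_battery_condition(J, levels, price=4.0, alpha=0.9))
# Low current price: stored energy is worth more than it costs now.
print(full_battery_condition(J, levels, price=0.5, alpha=0.9))
# Intermediate price: neither sufficient condition holds.
print(empty_battery_condition(J, levels, price=2.0, alpha=0.9),
      full_battery_condition(J, levels, price=2.0, alpha=0.9))
```

Because the conditions are only sufficient, a `False` result from both checks says nothing about the actual threshold.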



E. Proof of Proposition 2 

Proof: We will show that for all $b_1, b_2 \in \mathcal{B}$, $b_1 < b_2$ and 
$x \in \Omega$, 

$$J_x(b_1) - J_x(b_2) \leq (b_2 - b_1)P. \qquad (12)$$

It then readily follows that in both cases (i) and (ii), the first 
condition of Proposition 1 is satisfied. 

In order to show that (12) holds, we apply the finite horizon 
framework presented in the proof of Lemma 1 and use 
induction on the horizon $n$. First, it is readily seen that 

$$J_{x,0}(b_1) - J_{x,0}(b_2) = 0 \leq (b_2 - b_1)P.$$

Now assume that 

$$J_{x,n-1}(b_1) - J_{x,n-1}(b_2) \leq (b_2 - b_1)P.$$

Then 

$$\begin{aligned}
J_{x,n}(b_1) - J_{x,n}(b_2)
&= \big((\beta^*_x(b_1) - b_1) - (\beta^*_x(b_2) - b_2)\big)p(x) \\
&\quad + \alpha \int_{y \in \Omega} f_x(y)\,\big(J_{y,n-1}(\beta^*_x(b_1)) - J_{y,n-1}(\beta^*_x(b_2))\big)\,dy \\
&\leq \big((\beta^*_x(b_1) - b_1) - (\beta^*_x(b_2) - b_2)\big)P \\
&\quad + \alpha(\beta^*_x(b_2) - \beta^*_x(b_1))P = (b_2 - b_1)P,
\end{aligned}$$

completing the proof. ■ 



F. Proof of Proposition 3 

Proof: For i.i.d. prices we see that the future prices do 
not depend on the current state, i.e., $G_x = G$. Consequently, 
we have that the subdifferential of $G$ is also independent 
of $x$, and we write $\partial G(\beta) = [\sigma^-(\beta), \sigma^+(\beta)]$. Now, the 
sets $\mathcal{B}^-_{1,x}$, $\mathcal{B}^-_{2,x}$, $\mathcal{B}^+_{1,x}$ and $\mathcal{B}^+_{2,x}$ in (11) only depend on $x$ 
through $p(x)$, and are non-increasing in $p(x)$ because 
$\sigma^-$ and $\sigma^+$ are non-decreasing. Thus, the thresholds are 
non-increasing in $p(x)$ as well. ■ 

G. Details of Example 1 

We denote by $x_i$ the state such that $p(x_i) = i$, $i = 1, \dots, 4$. 
First, observe that in this example the control set does not 
depend on the battery level, and $\beta_x(b) = \beta_x$. Consequently, 
for all $y \in \Omega$, $0 \leq b_1 < b_2 \leq B$, 

$$J_y(b_1) - J_y(b_2) = (b_2 - b_1)p(y). \qquad (13)$$



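By Proposition 2 we know that $\beta_4 = 0$. In order to show 
that $\beta_1 = 1$, $\beta_2 = 0$ and $\beta_3 = 1$, we need to demonstrate that 
(cf. Proposition 1): 

$$\tfrac{1}{2}\big(J_{x_1}(\beta) - J_{x_1}(1)\big) + \tfrac{1}{2}\big(J_{x_3}(\beta) - J_{x_3}(1)\big) \geq (1 - \beta)p_1/\alpha, \qquad (14)$$

$$J_{x_1}(0) - J_{x_1}(\beta) \leq p_2 \beta/\alpha, \qquad (15)$$

$$J_{x_4}(\beta) - J_{x_4}(1) \geq p_3(1 - \beta)/\alpha, \qquad (16)$$

respectively, where $p_i = p(x_i)$. 
By (13), 

$$\tfrac{1}{2}\big(J_{x_1}(\beta) - J_{x_1}(1)\big) + \tfrac{1}{2}\big(J_{x_3}(\beta) - J_{x_3}(1)\big) = 2(1 - \beta) \geq (1 - \beta)/\alpha,$$

which corresponds to (14). For $\beta_2$ we have 

$$J_{x_1}(0) - J_{x_1}(\beta) = \beta p_1 \leq p_2 \beta/\alpha,$$

which satisfies (15). Finally, we see that also (16) is satisfied 
since 

$$J_{x_4}(\beta) - J_{x_4}(1) = (1 - \beta)p_4 \geq (1 - \beta)p_3/\alpha.$$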
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